GSoC & Open Source

~ In this article I'll be talking about my experience with GSoC and open source and why you should try it out too.~

A little bit about myself

Before diving into this blog post, allow me to introduce myself. I am Mebin Thattil, a student at PES University, Bangalore. I'm part of HSP, PIL and ACM. Over the summer after my first year, I was a part of the Google Summer of Code (GSoC) program. Before GSoC, I didn't have a lot of experience with open source. Don't let the creation date of my GitHub account fool you: even though I have had my account since 2018, I hadn't contributed to any major open source projects.

The reason I wanted to give this introduction is so that anyone reading this understands that if I can do it, so can you. Don't create an artificial barrier in your mind and shy away from GSoC / open source contributions thinking that you're not good enough. I personally had this notion in mind and felt that I wasn't good enough to contribute to open source. I always thought GSoC was something I could try in my second year and, if I didn't get accepted, try again the following year. Fortunately, it turns out my assumptions were wrong. It was my seniors who encouraged me to try GSoC and made me realize that I had nothing to lose by trying. So, I decided to give it a shot and I'm glad I did!

Setting context: What's open source?

Open source is a pretty self-explanatory term. It's a way to share your work with the world and encourage others to build on top of it. A lot of modern software would be impossible without open source, since many of the tools it uses under the hood are open source, and not using them would mean re-inventing the wheel over and over again, and doing so shabbily.

Let me explain the "shabby re-invention of the wheel" in a little more detail. Say you're working for a company that is going to make video editing software. Writing a lot of the under-the-hood tools for this, like video converters, would be a pain. And since your main target is to get the video editing software out soon, you can only allocate so much time to developing tools like video converters, encoders, decoders, etc. So you would naturally end up doing a bad job, missing edge cases or skipping optimizations.

Here comes the true beauty of open source: you could use a project like FFmpeg and get a lot of the under-the-hood heavy lifting done. FFmpeg is developed separately and does not have the same hard deadlines to complete and ship software that you have. So there is naturally a good chance that the code written is good code. Not to mention that the entire world is your code reviewer. This helps keep the code high quality, efficient and safe.

Lucky for you, you live in a world where open source exists, so you choose to use FFmpeg. But after using FFmpeg, you realize there are some additional features you need. Open source gives you the freedom to build on top of tools like this and add the features you want!

Now that you're all convinced that open source makes the world a better place, let me explain what GSoC is.

What is GSoC and why should you care?

GSoC, or Google Summer of Code, is a program that Google offers to students to get involved in open source projects. It is a great opportunity to learn about open source and contribute to it. It is also a great way to get exposure to the open source community, meet like-minded people and build a great network.

All the work you do during GSoC is open source. Having worked as a GSoC contributor looks good on your resume too! Having Google's tag on your resume makes it stand out in a good way. It's also a good indication to recruiters that you know how to work in a team setting, write good code and have real industry experience.

GSoC also pays pretty well. You could earn anywhere from $750 to $6000. The stipend depends on the size of the project and the country you reside in while you do your project. Stipends are paid by Google and not by the organization. So in a way, Google acts like a marketplace for organizations and students. Google pays stipends not only to the contributors but also to the organizations that the contributors work for!

What can GSoC teach you?

Working on a personal project is vastly different from how code is written in a company and team setting.

The first major challenge I faced was reading and making sense of huge codebases. While writing code for a personal / college project, you always start with a clean slate. This means that you write more code than you read. But this is reversed when working on an open source project or as an employee of an organization. There is a good chance you will have to build something on top of a pre-existing product, and for this the ability to read and scan through huge codebases really helps.
Navigating through huge codebases was initially very intimidating to me, but the longer I spent with the codebase the easier it got. And soon enough I reached a point where I was comfortable with the codebase and was able to build features on top of it.

Another big change is making sure any additional code you write is well documented. Tomorrow people are going to build things on top of yours, so it's your responsibility to write code that is documented enough that it makes building on top of your code a breeze!

Working in an organization also forces you to think a lot from the end-user's perspective and make sure you include all kinds of users and attend to their differing requirements.
I would like to expand upon this with an example: I built a finance tracker last year. Essentially, it parses my bank statement and categorizes my finances. I wrote it to parse only SBI bank statements, because that's where my account is, so I didn't need to bother about other banks. But if I were doing something like this as part of my GSoC project or while working for an org/company, I would have to make sure that I include users from all possible banks. Also, my finance app is mostly CLI based. While that's fine for me, it would be intimidating for a lot of non-tech-savvy people. So here I'm intentionally cutting off a group of people that could use my product (because I can), but if this were done as part of GSoC or any organization / company, taking care of this and prioritizing inclusivity would become a priority.
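
To make the "support every bank" point concrete, here is a minimal sketch of one way such a tracker could be structured so that adding a bank means adding one parser function rather than rewriting the tool. It is not the code of the actual finance tracker: the `Transaction` type, the registry, and the CSV column names (`Date`, `Description`, `Debit`, `Credit`) are all hypothetical.

```python
import csv
import io
from typing import Callable, Dict, List, NamedTuple


class Transaction(NamedTuple):
    date: str
    description: str
    amount: float  # negative for money going out, positive for money coming in


# Registry mapping a bank name to the function that understands its statement format.
PARSERS: Dict[str, Callable[[str], List[Transaction]]] = {}


def register(bank: str):
    """Decorator that adds a parser function to the registry under `bank`."""
    def decorator(func):
        PARSERS[bank] = func
        return func
    return decorator


@register("sbi")
def parse_sbi(text: str) -> List[Transaction]:
    # Hypothetical CSV layout: Date,Description,Debit,Credit
    transactions = []
    for row in csv.DictReader(io.StringIO(text)):
        debit = float(row["Debit"] or 0)
        credit = float(row["Credit"] or 0)
        transactions.append(Transaction(row["Date"], row["Description"], credit - debit))
    return transactions


def parse_statement(bank: str, text: str) -> List[Transaction]:
    """Pick the right parser for `bank`, or fail with a clear message."""
    try:
        parser = PARSERS[bank.lower()]
    except KeyError:
        raise ValueError(f"No parser registered for bank {bank!r}") from None
    return parser(text)


statement = "Date,Description,Debit,Credit\n01/04/2024,Groceries,500.00,\n02/04/2024,Salary,,20000\n"
for txn in parse_statement("SBI", statement):
    print(txn)
```

Supporting a second bank would then be a matter of writing another function and decorating it with `@register("...")`; the rest of the tool stays untouched.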

Another big one is effective communication. As a global program, GSoC often involves working with mentors and contributors across different time zones. This limits the opportunity for real-time, back-and-forth conversations, making it essential to communicate clearly and concisely. Without this clarity, miscommunication can lead to delays and wasted effort. Unlike traditional offline internships or jobs, GSoC is entirely remote and relies heavily on asynchronous channels like email, GitHub, and Matrix. As a result, strong written communication skills are crucial.

Those were just some of my major learnings; there are many more that I have not covered here.

My GSoC application and journey

My organization - SugarLabs

The organization I contributed to over the summer was SugarLabs. SugarLabs is a global non-profit organization with a mission close to my heart - improving accessible education for children globally through technology. SugarLabs is best known for its Sugar learning platform. Sugar is what powers the One Laptop Per Child (OLPC) laptops. Sugar is essentially a desktop environment that sits on top of Linux and has a bunch of activities that can run within it.
The first time I heard about SugarLabs was while going through the orgs page on GSoC. Prior to that, I had no idea about SugarLabs.

One piece of advice I’d give to anyone applying for GSoC is to choose an organization whose work genuinely interests you or makes a meaningful impact. Passion and curiosity are your greatest strengths. Try to find a project you’d be excited to work on even if you don't get paid. This mindset can make a big difference, and I truly can’t stress it enough.

When and how I contributed

I started looking into GSoC / orgs around the second week of February, which was a few weeks before the orgs were released. I spent a good few days setting up and just tinkering around with Sugar, trying out different activities. This helped me understand what the product was good at and what could be improved.

Now let's talk contributions. When you are new to the codebase and community, it's easier to start contributing by picking up issues with the "good first issue" tag. Another way to contribute is through simple quality-of-life improvements, such as adding keyboard shortcuts for a few things that are accessed repeatedly (this was one of my PRs pre-GSoC xD).

Once you have familiarized yourself with the community and codebase, you can start to go through the organization's GSoC idea list (if they have one). This is a list that orgs usually put out, detailing the projects they would like contributors to pick up during the summer. You are still free to propose your own idea, but that's usually harder, as you would then also have to find a mentor and convince them to mentor you on it over the summer.

Something that helped my application was the fact that I had an almost working demo of the project I wanted to work on. This gave both me and my mentors confidence that I would be able to do this over the summer.

So, before the selections I had raised about 4 PRs and worked on things like the demo (for which PRs were not raised).

My project & proposal

My GSoC proposal was based on an idea from the organization's project idea list. The proposal was to modernize the Speak activity by swapping out the old robotic speech synthesis for a more natural-sounding TTS model, and to use a combination of a Small Language Model (SLM) running locally and a Large Language Model (LLM) hosted on the cloud to power the chatbot mode.

Here are some of the challenging and interesting problems to solve as part of this:

As I mentioned before, Sugar is what runs on the OLPC laptops. The first generation of OLPC laptops has only 256MB of system RAM! After running Linux and Sugar on top of that, there is hardly any headroom for an activity. So running an SLM locally under such tight hardware constraints was a real challenge.
To make matters worse, the SLM is not the only thing that needs to run locally; the TTS model also has to run locally! Modifications also had to be made to the audio pipeline to enable optimizations such as streaming audio from the model into the pre-existing audio pipeline.
Distribution / creation of API keys for sending requests to the LLM hosted on the cloud was another thing to think about. You need an API key so nobody exploits your server, but at the same time you would ideally want to distribute and package it within the activity. But the activity being open source means that people would be able to see and read the API key. So implementing this in the most frictionless way possible was something to think about.

Project development and things I built along the way

Before implementing things in my GSoC project, the first few weeks were spent testing and benchmarking different models to see which worked best for our use case.

SLM Benchmarking

By far the trickiest one to benchmark was the SLM. For the SLM, the ideal approach was to start with a really small model and then work our way from there by quantizing it and converting it to other formats like GGUF for performance benefits. So after some initial research I started working with TinyLLaMA 1.1B. After fine-tuning, quantizing and converting to GGUF, the model was still too big for our use case and did not perform well. I then tried the same thing on many other models. I did this process of fine-tuning, quantizing and converting to GGUF so many times that I eventually created a script to automate it xD.
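
A back-of-the-envelope estimate helps show why a 1.1B-parameter model is hard to fit on this hardware. The sketch below only counts the bytes needed to store the weights (parameters times bits per weight, divided by 8); the bit widths are illustrative assumptions, not the exact quantization settings used in the project, and real GGUF files carry some extra overhead on top.

```python
def model_size_mb(num_params: int, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in megabytes (10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


RAM_MB = 256  # total RAM of a first-generation OLPC laptop, as mentioned earlier

for name, params in [("1.1B model", 1_100_000_000), ("135M model", 135_000_000)]:
    for bits in (16, 8, 4):  # assumed bit widths, for illustration only
        size = model_size_mb(params, bits)
        verdict = "under" if size < RAM_MB else "over"
        print(f"{name} @ {bits:2d}-bit: {size:7.1f} MB ({verdict} {RAM_MB} MB, weights only)")
```

Even at 4 bits per weight, a 1.1B-parameter model needs roughly 550 MB for its weights, which is more than double the laptop's total RAM before Linux and Sugar take their share. A 135M-parameter model at the same width needs about 67.5 MB.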

These models were still too large. I then realized there were two ways I could go about this: either quantize even more aggressively or pick an even smaller model. I initially tried the first approach, but heavy quantization took the model's responses from bad to awful.
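
To get a feel for why aggressive quantization hurts, here is a toy sketch that rounds a list of made-up "weights" to fewer and fewer bits and measures how far they drift from the originals. It uses plain symmetric uniform quantization, which is an assumption chosen for simplicity; the schemes used by GGUF tooling are block-wise and more sophisticated, so treat this as an illustration of the trend rather than of the actual process.

```python
import random


def quantize_roundtrip(weights, bits):
    """Symmetric uniform quantization to `bits` bits, followed by dequantization."""
    levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]


def mean_abs_error(original, approx):
    return sum(abs(x - y) for x, y in zip(original, approx)) / len(original)


random.seed(0)
# Made-up weights drawn from a small Gaussian; not taken from any real model.
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

errors = {bits: mean_abs_error(weights, quantize_roundtrip(weights, bits)) for bits in (8, 4, 2)}
for bits, err in errors.items():
    print(f"{bits}-bit: mean absolute error = {err:.6f}")
```

Each time the bit width is cut, every weight is forced onto a coarser grid, so the average error grows. In a real model those errors compound across layers, which is consistent with the responses degrading so sharply.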

So then I decided to give even smaller models a shot. I tried my luck with LLaMA-135M. After repeating the process of fine-tuning, quantizing and converting to GGUF, this model was pretty lightweight, and the responses were still pretty bad, though not awful. So I tried fine-tuning it differently, and once that was done I had multiple models to compare, all of which can be found on my Hugging Face. To review and grade the model responses, I could not expect my mentors and community members to run all 16 of these models locally one by one, so I created an SLM benchmark Streamlit app to compare the different responses.

TTS Benchmarking

TTS was much easier to benchmark, and finding the right model was simpler too. After testing a couple of models we landed on Kokoro. Kokoro was lightweight enough, open source, supported multiple languages and also had the ability to mix multiple stock voices to create new voices. This basically let us create a practically unlimited number of voices.
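
Here is a rough sketch of why mixing gives so many voices. It assumes each stock voice can be represented as a fixed-length vector of numbers and that a new voice is a weighted average of existing ones; the tiny vectors and the `mix_voices` helper are made up for illustration and are not Kokoro's API.

```python
def mix_voices(voices, weights):
    """Blend voice vectors with a weighted average (weights need not sum to 1)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    length = len(voices[0])
    if any(len(voice) != length for voice in voices):
        raise ValueError("all voice vectors must have the same length")
    return [
        sum(weight * voice[i] for voice, weight in zip(voices, weights)) / total
        for i in range(length)
    ]


# Two made-up "voices"; real voice vectors would be much longer.
voice_a = [0.0, 1.0]
voice_b = [1.0, 0.0]

print(mix_voices([voice_a, voice_b], [1, 1]))  # equal blend
print(mix_voices([voice_a, voice_b], [1, 3]))  # leans towards voice_b
```

Because the weights can be any positive numbers, even a handful of stock voices can be blended in endlessly many ways.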

To test out these voices and get feedback from mentors and the community, I created another Streamlit site where people could try different Kokoro voices and experiment with mixing and matching them. We also had to settle on 5-6 voices as 'default voices', since each voice is around 0.5MB in size and we want the activity to be as lightweight as possible. The rest of the voices could be downloaded from within the activity later if needed, pulled from the Hugging Face Hub.

LLM Benchmarking

This was also fairly easy, as most of the models performed pretty well out of the box. The only things that needed to be taken care of here were the model guard-rails and the profanity filters and checks. However, a benchmark was still made so people could see how the model performed.

Once the benchmarks were done, I started working on the actual implementation of the activity.

Working on the TTS

The challenge here was to get Kokoro to stream audio into the pre-existing audio pipeline. I don't want to get too much into the technical details of this (but if you're interested, take a look at my GSoC weekly blogs on my site), but a lot of re-sampling and other optimizations were needed to make existing features like the mouth work with this, as their logic depended to some extent on the audio properties of the previous pipeline. I also changed the fallback G2P engine for Kokoro from espeak-ng to espeak, since the previous version of the activity already used espeak and this avoided adding an extra dependency.
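
As a simplified illustration of what re-sampling streamed audio involves, the sketch below converts chunks of samples from one sample rate to another using linear interpolation. This is not the activity's actual code: the 24 kHz source rate and 16 kHz target rate are assumptions picked for the example, and a real pipeline would use a proper resampling filter and handle continuity across chunk boundaries.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a list of float samples from src_rate to dst_rate by linear interpolation."""
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # how far to advance in the input per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        if left >= len(samples) - 1:
            out.append(samples[-1])  # past the last full interval: hold the final sample
            continue
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out


def resample_stream(chunks, src_rate, dst_rate):
    """Resample audio chunk by chunk, so playback can start before synthesis finishes."""
    for chunk in chunks:
        yield resample_linear(chunk, src_rate, dst_rate)


# Pretend the TTS model yields two short chunks at an assumed 24 kHz.
model_chunks = [[0.0, 0.1, 0.2, 0.3, 0.4, 0.5], [0.5, 0.4, 0.3, 0.2, 0.1, 0.0]]
for chunk in resample_stream(model_chunks, 24_000, 16_000):
    print(len(chunk), "samples ready for the pipeline")
```

Processing chunk by chunk is what makes streaming possible: each piece can be handed to the audio pipeline as soon as it is resampled, instead of waiting for the whole utterance.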

Working on profanity filters

Profanity filters and child safety measures were of high importance. The filters I implemented perform two checks: one on whether the child entered something profane, and another on whether the model's output was profane. In both cases the output would be intercepted.
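
The two-check flow can be sketched roughly as follows. This is a minimal illustration, not the filter used in the activity: the blocklist words, the refusal message and the `generate` callback are all placeholders, and a production filter would need far more than exact word matching.

```python
import re

BLOCKLIST = {"darn", "heck"}  # placeholder words standing in for a real profanity list
REFUSAL = "Let's talk about something else."


def is_profane(text: str) -> bool:
    """Return True if any word in the text is on the blocklist."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in BLOCKLIST for word in words)


def safe_reply(user_text: str, generate) -> str:
    """Run both checks around `generate`, a function that maps a prompt to a reply."""
    if is_profane(user_text):
        return REFUSAL  # check 1: the prompt never reaches the model
    reply = generate(user_text)
    if is_profane(reply):
        return REFUSAL  # check 2: the model's reply is intercepted
    return reply


print(safe_reply("hello there", lambda prompt: "Hi! How are you?"))
print(safe_reply("what the heck", lambda prompt: "Hi! How are you?"))
```

Checking the input first also saves a model call whenever the prompt is rejected.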

Working on the SLM

Most of the challenges of working with the SLM were mentioned above. In short, I chose a small enough model (0.1B parameters), quantized it and converted it to GGUF. Multiple rounds of fine-tuning were done to get it to a level we were happy with. It was pretty interesting to see how far the model had come (at the start the model could hardly understand English grammar)!

Working on the LLM

Now this part was not tricky per se, but it was something that had to be done well and done right. The reason I say this is that this was the first time the organization was trying to integrate AI into activities. At Sugar we wanted to have a centralized way to integrate AI into different activities. That's how SugarAI was born.

All the API keys and their management would be handled by SugarAI, and the activity would just have to send requests to SugarAI and get responses.
I was responsible for the deployment of SugarAI. SugarAI is hosted on a G5 instance on AWS and runs in a Docker container. SugarAI has two parts: one serves the API and generates responses, and the other handles API key management. I set up Nginx and SSL certificates for both.
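
As a rough illustration of what server-side key management can look like (this is not SugarAI's actual code), the sketch below issues a separate random key per user and stores only a hash of it, so nothing secret needs to be packaged inside the open source activity. The `KeyStore` class and its methods are hypothetical names.

```python
import hashlib
import hmac
import secrets


class KeyStore:
    """Hypothetical in-memory store that issues, verifies and revokes API keys."""

    def __init__(self):
        self._hashes = set()  # only hashes are kept, never the keys themselves

    @staticmethod
    def _digest(key: str) -> str:
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def issue(self) -> str:
        """Create a new random key; the caller sees it once and must keep it."""
        key = secrets.token_urlsafe(32)
        self._hashes.add(self._digest(key))
        return key

    def revoke(self, key: str) -> None:
        self._hashes.discard(self._digest(key))

    def is_valid(self, key: str) -> bool:
        digest = self._digest(key)
        # compare_digest avoids leaking information through comparison timing.
        return any(hmac.compare_digest(digest, stored) for stored in self._hashes)


store = KeyStore()
user_key = store.issue()
print(store.is_valid(user_key))     # a freshly issued key is accepted
print(store.is_valid("not-a-key"))  # an unknown key is rejected
```

With per-user keys, a key that gets abused can be revoked on its own without affecting anyone else.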

Other SysOps stuff

During my summer I tried to pick up and learn about the infrastructure of SugarLabs. At one point in the summer we faced an issue with our wiki: the machine running it had issues and there was significant downtime. We needed a way to keep the wiki available even when the machine was down. So I suggested moving our nameservers to Cloudflare (this was not the only reason we moved to Cloudflare, we had other reasons too). Cloudflare offers features like Always Online, which essentially serves your site using a backup on the Internet Archive if the origin is down. This means that when the wiki server is down, the wiki still remains available in read-only mode. Sure, you can't write to the wiki during this time, but it's still better than having the wiki down.

This gave me a good learning experience in how nameservers are changed, and I also got to talk to some of the amazing people who manage SugarLabs infrastructure (one of them works as a senior engineer at SpaceX!).

I really liked doing these things alongside my GSoC project, and they really got me hooked on how infrastructure is managed.

Summary

Overall, I really enjoyed my GSoC experience. I learnt a lot and got to meet and work with some amazing people! I also got to work on a project that I was passionate about. It was a great experience and I would highly recommend it to anyone who wants to get involved in open source.

If you're interested in learning more about SugarLabs, check out their website. If you want to reach out to me, you can find me on my personal site or on my GitHub.
You can also shoot me a mail at [email protected].